# Improve Mistral models integration with llama.cpp #14737
## Description
This PR aims to enhance the integration of Mistral models with llama.cpp by addressing several key issues and introducing new features. Here are the details:
## Context
### Using mistral-common with llama.cpp

We recommend that users only use the `llama-server` tool with the `/completions` route of the server for now, as it is the only one that supports tokens input. We also advise users to set `return_tokens=True` in their requests to let `mistral-common` handle detokenization.
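For illustration, a minimal sketch of such a request against a local `llama-server` (the port and token ids are placeholders):

```sh
# "prompt" is a list of token ids produced by mistral-common;
# "return_tokens" asks llama-server to return the generated token ids
# so that mistral-common can detokenize them client-side.
curl http://localhost:8080/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": [1, 3, 1234, 4], "return_tokens": true}'
```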
### Added features
- We have added a script, `convert_mistral_to_gguf.py`, to convert Mistral models to GGUF directly from Hugging Face (an example invocation is shown in the Example Code section below).
- We have registered the Mistral architecture in llama.cpp so that Mistral models are supported natively, without users having to convert them to the Hugging Face format first.
### Known limitations

- Our approach does not support multimodality.
- This approach requires users to use the llama.cpp server only with the `/completions` route.

## Example Code
To get started, install `mistral-common` using the following command:
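(A plain PyPI install is assumed here; this PR may require a more recent `mistral-common` release than the one currently published.)

```sh
pip install mistral-common
```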
### (Optional) Convert the model
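A sketch of the conversion step, assuming the new script mirrors the interface of the existing `convert_hf_to_gguf.py` (`--outfile`/`--outtype` flags); the model id and output file name are placeholders:

```sh
# Assumption: the script accepts a Hugging Face repo id (the PR says it
# converts "directly from Hugging Face") and convert_hf_to_gguf.py-style flags.
python convert_mistral_to_gguf.py mistralai/Devstral-Small-2507 \
    --outfile devstral-small.gguf --outtype bf16
```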
### Launch the mistral-common and llama.cpp servers
Launch the mistral-common server:
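The exact entry point for the `mistral-common` server depends on the installed version; the invocation below is a hypothetical sketch (an ASGI app served with uvicorn), not a documented interface:

```sh
# Hypothetical sketch: the module path is a placeholder; check the
# mistral-common documentation for the actual server entry point.
HF_TOKEN=<your_token> uvicorn <mistral_common_app_module>:app --port 8000
```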
Launch the llama.cpp server:
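The llama.cpp side uses the standard `llama-server` binary; the GGUF path and port are placeholders:

```sh
# Serve the converted model; per the recommendation above, clients should
# only use the /completions route.
llama-server -m devstral-small.gguf --port 8080
```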
### Use the servers
Here is a code snippet demonstrating how to use the new features:
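A minimal Python sketch, assuming a local `llama-server` on port 8080; the tokenizer name and the `tokens` response field are assumptions to check against your setup:

```python
# Minimal sketch: tokenize with mistral-common, generate with llama-server,
# detokenize with mistral-common. Names marked as assumptions may differ.
import requests

from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Assumption: pick a tokenizer matching the model you serve with llama-server.
tokenizer = MistralTokenizer.from_model("open-mixtral-8x22b")

# Tokenize the chat request client-side with the official templates.
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(messages=[UserMessage(content="Who are you?")])
)

# Send raw token ids to the /completions route and set return_tokens=True,
# so that detokenization stays on the mistral-common side.
resp = requests.post(
    "http://localhost:8080/completions",
    json={"prompt": tokenized.tokens, "return_tokens": True},
)
resp.raise_for_status()

# Assumption: the generated token ids are returned in the "tokens" field.
generated = resp.json()["tokens"]
print(tokenizer.decode(generated))
```

Keeping tokenization and detokenization inside `mistral-common` ensures the official chat templates and special tokens are applied exactly, which is the motivation for the `/completions`-with-tokens recommendation above.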
## Feedback and Contributions

We believe these changes will significantly improve the integration of Mistral models with llama.cpp and provide a better experience for our users. We welcome any feedback or suggestions to further enhance this integration. Also, as we have limited experience with the llama.cpp codebase, we welcome any help to improve the integration and make sure we respect the codebase and the community.